Data EDA

Fandango Movie Ratings

Highlight

First, let's import our basic libraries and set the option to display all columns in the data.

In [1]:
#common imports
import pandas as pd
import numpy as np

#Additional imports/optional
from pandas import Series, DataFrame 

# This suppresses SettingWithCopyWarning
pd.options.mode.chained_assignment = None  # default='warn'

# set display columns - Do not change or CodeGrade may not function correctly
pd.set_option('display.max_columns', None)

Exercise: Import the data from the fandango.csv file and save as a DataFrame called fandango. Look at the first five rows.

In [2]:
# load data from the fandango.csv file

fandango = pd.read_csv('fandango.csv')                      
fandango#.head(5)
Out[2]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.30 3.30 3.55 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.00 3.35 3.75 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.50 3.20 4.05 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 1.10 2.35 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.40 1.45 1.70 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 3.35 3.95 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 3.40 3.20 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0

146 rows × 22 columns

As you can see, this data contains review scores from four review sites: Rotten Tomatoes, Metacritic, IMDB, and Fandango. The sites use different rating methodologies, and Rotten Tomatoes and Metacritic each report separate critic and user scores.

According to the 538 article, the columns with 'norm' in the column name are scores that have been normalized to the five-star rating scale that Fandango uses and the columns with 'norm_round' in the column name are those columns rounded to the nearest half-star.
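As a rough illustration of the two steps described here, the sketch below rescales a 0-100 score to the five-star scale and then rounds it to the nearest half star. The tie rule (4.25 becomes 4.5) is an assumption inferred from the rows shown above, not something the article is quoted as stating, and the helper name to_half_star is made up for this example.

```python
import numpy as np
import pandas as pd

# RottenTomatoes critic scores for the first five films shown above
rt = pd.Series([74, 85, 80, 18, 14], name="RottenTomatoes")

# Step 1: rescale 0-100 to 0-5 (100 / 20 = 5 stars)
rt_norm = rt / 20


def to_half_star(scores):
    """Round to the nearest 0.5, sending ties upward (assumed rule)."""
    return np.floor(scores * 2 + 0.5) / 2


# Step 2: round to the nearest half star
rt_norm_round = to_half_star(rt_norm)
print(rt_norm_round.tolist())
```

For these five films the result agrees with the RT_norm_round values displayed in the first five rows of the table.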

Exercise: Use .info() to obtain more information about the columns, number of entries, number of missing values (if any), and the data type for each column.

In [3]:
#Use .info() to obtain more information

fandango.info()                        
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 22 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   FILM                        146 non-null    object 
 1   RottenTomatoes              146 non-null    int64  
 2   RottenTomatoes_User         146 non-null    int64  
 3   Metacritic                  146 non-null    int64  
 4   Metacritic_User             146 non-null    float64
 5   IMDB                        146 non-null    float64
 6   Fandango_Stars              146 non-null    float64
 7   Fandango_Ratingvalue        146 non-null    float64
 8   RT_norm                     146 non-null    float64
 9   RT_user_norm                146 non-null    float64
 10  Metacritic_norm             146 non-null    float64
 11  Metacritic_user_nom         146 non-null    float64
 12  IMDB_norm                   146 non-null    float64
 13  RT_norm_round               146 non-null    float64
 14  RT_user_norm_round          146 non-null    float64
 15  Metacritic_norm_round       146 non-null    float64
 16  Metacritic_user_norm_round  146 non-null    float64
 17  IMDB_norm_round             146 non-null    float64
 18  Metacritic_user_vote_count  146 non-null    int64  
 19  IMDB_user_vote_count        146 non-null    int64  
 20  Fandango_votes              146 non-null    int64  
 21  Fandango_Difference         146 non-null    float64
dtypes: float64(15), int64(6), object(1)
memory usage: 25.2+ KB

Exercise: Use .describe() to obtain a statistical overview of the data.

In [4]:
# Use .describe() to obtain a statistical overview of the data               

fandango.describe()
Out[4]:
RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
count 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000
mean 60.849315 63.876712 58.808219 6.519178 6.736986 4.089041 3.845205 3.042466 3.193836 2.940411 3.259589 3.368493 3.065068 3.226027 2.972603 3.270548 3.380137 185.705479 42846.205479 3848.787671 0.243836
std 30.168799 20.024430 19.517389 1.510712 0.958736 0.540386 0.502831 1.508440 1.001222 0.975869 0.755356 0.479368 1.514600 1.007014 0.990961 0.788116 0.502767 316.606515 67406.509171 6357.778617 0.152665
min 5.000000 20.000000 13.000000 2.400000 4.000000 3.000000 2.700000 0.250000 1.000000 0.650000 1.200000 2.000000 0.500000 1.000000 0.500000 1.000000 2.000000 4.000000 243.000000 35.000000 0.000000
25% 31.250000 50.000000 43.500000 5.700000 6.300000 3.500000 3.500000 1.562500 2.500000 2.175000 2.850000 3.150000 1.500000 2.500000 2.125000 3.000000 3.000000 33.250000 5627.000000 222.250000 0.100000
50% 63.500000 66.500000 59.000000 6.850000 6.900000 4.000000 3.900000 3.175000 3.325000 2.950000 3.425000 3.450000 3.000000 3.500000 3.000000 3.500000 3.500000 72.500000 19103.000000 1446.000000 0.200000
75% 89.000000 81.000000 75.000000 7.500000 7.400000 4.500000 4.200000 4.450000 4.050000 3.750000 3.750000 3.700000 4.500000 4.000000 4.000000 4.000000 3.500000 168.500000 45185.750000 4439.500000 0.400000
max 100.000000 94.000000 94.000000 9.600000 8.600000 5.000000 4.800000 5.000000 4.700000 4.700000 4.800000 4.300000 5.000000 4.500000 4.500000 5.000000 4.500000 2375.000000 334164.000000 34846.000000 0.500000

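This overview already shows something interesting when we compare Fandango_Stars with the rounded, normalized scores of the other sites. Its mean is higher (about 4.09 versus 2.97 to 3.38) and its minimum is higher (3.0 versus 0.5 to 2.0). Its standard deviation (about 0.54) is well below that of Rotten Tomatoes (1.51) and Metacritic (0.99), though close to IMDB's (0.50).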
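To put just those statistics side by side, one option is DataFrame.agg. The sketch below uses six rows copied from the outputs in this notebook (rows 0-4 and the Fantastic Four row), so its numbers will not match the full 146-row summary; on the real data the same call would be applied to the corresponding columns of fandango.

```python
import pandas as pd

# Six rows copied from the outputs in this notebook (rows 0-4 and row 48)
sample = pd.DataFrame({
    "Fandango_Stars": [5.0, 5.0, 5.0, 5.0, 3.5, 3.0],
    "RT_norm_round": [3.5, 4.5, 4.0, 1.0, 0.5, 0.5],
    "Metacritic_norm_round": [3.5, 3.5, 3.0, 1.0, 1.5, 1.5],
    "IMDB_norm_round": [4.0, 3.5, 4.0, 2.5, 2.5, 2.0],
})

# One row per statistic, one column per site
summary = sample.agg(["mean", "std", "min"])
print(summary.round(3))
```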

Q1: The article begins by talking about the movie: Fantastic Four (2015). Index the fandango DataFrame and select only the row for this film. Save it as Q1.

In [5]:
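# Select only the row for this film and save it as Q1.
# "Index" here means boolean indexing: the mask keeps the rows where FILM matches.
# Note: fandango.set_index('FILM') returns a new DataFrame, so calling it
# without assigning the result leaves fandango unchanged; it is not needed here.

Q1 = fandango.loc[fandango['FILM'] == 'Fantastic Four (2015)']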
Q1
Out[5]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
48 Fantastic Four (2015) 9 20 27 2.5 4.0 3.0 2.7 0.45 1.0 1.35 1.25 2.0 0.5 1.0 1.5 1.5 2.0 421 39838 6288 0.3

As you can see, the Fandango_Stars rating is 3.0, while the normalized scores from the other review sites (the columns with norm in the name) are much lower.
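There are two common ways to pull out a single film, sketched here on a two-row frame trimmed from the outputs above. The variable names (films, by_mask, by_title, by_label) are illustrative only. The point to note is that set_index returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Two rows copied from the outputs above, trimmed to three columns
films = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Fantastic Four (2015)"],
    "Fandango_Stars": [5.0, 3.0],
    "RT_norm_round": [3.5, 0.5],
})

# Option 1: boolean mask, keeps the integer index and returns a DataFrame
by_mask = films[films["FILM"] == "Fantastic Four (2015)"]

# Option 2: label lookup; set_index returns a NEW DataFrame, so assign it
by_title = films.set_index("FILM")
by_label = by_title.loc[["Fantastic Four (2015)"]]

print(by_mask)
print(by_label)
```

The boolean mask keeps FILM as an ordinary column, which is the shape the rest of this notebook relies on.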

Q2: Select the Fandango_Stars column as a Series and save it as Q2A. What is the minimum score in the Series Q2A? Save this minimum value as Q2B.

In [6]:
#Select the Fandango_Stars column as a Series and save it as Q2A

Q2A = fandango.Fandango_Stars                                   
Q2A                                                                   
Out[6]:
0      5.0
1      5.0
2      5.0
3      5.0
4      3.5
      ... 
141    4.0
142    3.5
143    3.5
144    3.5
145    3.5
Name: Fandango_Stars, Length: 146, dtype: float64
In [7]:
#  The minimum score in the Series Q2A             
# Save the result as Q2B

Q2B = fandango.Fandango_Stars.min()                                 
Q2B
Out[7]:
3.0

Q3: What is the maximum Fandango_Stars score in the fandango data? Save this as Q3.

In [8]:
# maximum Fandango_Stars score in the fandango data                    
#Save as Q3

Q3 = fandango.Fandango_Stars.max()                                 
Q3
Out[8]:
5.0

If you check the minimum and maximum scores for the other review sites, you should notice that the range of the Fandango star ratings is much narrower.

Q4: What is the minimum score for RT_norm_round, Metacritic_norm_round, and IMDB_norm_round? Save these three values in that order as a list and call it Q4.

In [9]:
#Minimum score for RT_norm_round, Metacritic_norm_round, and IMDB_norm_round
#Save these three values in that order as a list and call it Q4
#First convert the three columns to a NumPy array, take the column-wise min(), then convert the result to a list

Q4 = fandango[['RT_norm_round','Metacritic_norm_round','IMDB_norm_round']].to_numpy()
Q4 = np.min(Q4, axis = 0).tolist()
Q4
Out[9]:
[0.5, 0.5, 2.0]

Q5: What is the maximum score for RT_norm_round, Metacritic_norm_round, and IMDB_norm_round? Save these three values in that order as a list and call it Q5.

In [10]:
#Maximum score for RT_norm_round, Metacritic_norm_round, and IMDB_norm_round
#First convert the three columns to a NumPy array, take the column-wise max(), then convert the result to a list
Q5 = fandango[['RT_norm_round','Metacritic_norm_round','IMDB_norm_round']].to_numpy()
Q5 = np.max(Q5, axis = 0).tolist()
Q5
Out[10]:
[5.0, 4.5, 4.5]

As we saw above, the lowest rating for Fandango for a movie was three stars, while the rounded, normalized review stars for the other sites were much lower. The other sites also have a much wider range of values.
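The minimum, maximum, and range can also be computed without leaving pandas, since DataFrame.min and DataFrame.max already work column by column. This is a sketch on the first five rows copied from the output above, so the values differ from Q4 and Q5, which use all 146 rows.

```python
import pandas as pd

# First five rows of the three rounded columns, copied from the output above
sample = pd.DataFrame({
    "RT_norm_round": [3.5, 4.5, 4.0, 1.0, 0.5],
    "Metacritic_norm_round": [3.5, 3.5, 3.0, 1.0, 1.5],
    "IMDB_norm_round": [4.0, 3.5, 4.0, 2.5, 2.5],
})

# Column-wise min and max, converted straight to plain Python lists
mins = sample.min().tolist()
maxs = sample.max().tolist()

# Width of each site's range on this sample
ranges = (sample.max() - sample.min()).tolist()

print(mins, maxs, ranges)
```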

Run the cell below to view boxplots of the various sites' star ratings. Notice the smaller range and higher minimum for the Fandango star ratings.

In [11]:
import matplotlib.pyplot as plt

data = fandango[['Fandango_Stars','RT_norm_round','Metacritic_norm_round','IMDB_norm_round']]
labels = ['Fandango', 'RT', 'Metacritic','IMDB']

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))

# rectangular box plot
bplot1 = ax.boxplot(data,
                     patch_artist=True,  # fill with color
                     labels=labels)  # will be used to label x-ticks

ax.set_title('Movie Stars Box Plots')

plt.show()

Q6: Using value_counts, what is the breakdown of counts for the Fandango_Stars column sorted by the index? Save this as Q6A.

What is the breakdown of relative frequencies (percentages) of the unique values sorted by the index and rounded to 3 decimal places? Check the Pandas value_counts documentation for help. Save this as Q6B.

In [12]:
#Breakdown of counts for the Fandango_Stars column sorted by index

Q6A = fandango['Fandango_Stars'].value_counts().sort_index(ascending=True)                      
Q6A
Out[12]:
3.0    12
3.5    27
4.0    41
4.5    55
5.0    11
Name: Fandango_Stars, dtype: int64
In [13]:
#Breakdown of relative frequencies (percentages) of the unique values 
#Sorted by the index and rounded to 3 decimal places

Q6B = fandango['Fandango_Stars'].value_counts(normalize=True).round(3).sort_index()
Q6B
Out[13]:
3.0    0.082
3.5    0.185
4.0    0.281
4.5    0.377
5.0    0.075
Name: Fandango_Stars, dtype: float64

Q7: Given the output for Q6B, what is the sum of the frequency of values for 4 stars or more? You may hard code this number and round to 2 decimals. Save this as Q7.

In [14]:
#The sum of the frequencies for 4 stars or more
# Hard code the three Q6B frequencies, then round the total to 2 decimals
# (rounding each term before summing can shift the total).
# Save this as Q7

Q7 = round(0.281 + 0.377 + 0.075, 2)
Q7
Out[14]:
0.73

For comparison with your output above, the sums of the frequencies for 4 stars or more on the other sites are as follows:

  • RT_norm_round = 0.419
  • Metacritic_norm_round = 0.274
  • IMDB_norm_round = 0.233

You should see that the output for Q7 is higher, which is one indication that Fandango rates their movies with higher overall scores.
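The same share can be computed directly instead of being hard coded, because the mean of a boolean mask is the fraction of True values. The sketch below rebuilds the Fandango_Stars column from the counts printed in Out[12]; the names stars, freq, and share_4_plus are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the Fandango_Stars column from the counts shown in Out[12]
stars = pd.Series(
    np.repeat([3.0, 3.5, 4.0, 4.5, 5.0], [12, 27, 41, 55, 11]),
    name="Fandango_Stars",
)

# Relative frequencies sorted by star value, as in Q6B
freq = stars.value_counts(normalize=True).sort_index().round(3)

# Share of films with 4 stars or more: the mean of a boolean mask
share_4_plus = round((stars >= 4).mean(), 2)

print(freq.tolist())
print(share_4_plus)
```

Working from the unrounded frequencies avoids any drift from adding values that were already rounded.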

In [15]:
# Use this to check the discrepancy between the actual ratings and the stars shown on Fandango
# fandango['Fandango_Difference'].unique()

Q8: In the article, the author "normalized" the data by turning the sites with 0-100 ranges to 0-5 ranges. Create a Series called Q8 that takes the values in the RottenTomatoes column and normalizes them to 0-5 ranges. To double check your calculation, your Series should match exactly the RT_norm column.

For example: A score of 100 will equal a new score of 5.00, a score of 50 will equal a new score of 2.50, a score of 20 will equal a new score of 1, etc. You should be able to calculate this with one line of code.
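The three examples in the text can be checked with a short sketch. Dividing by 20 is the one-line calculation; the comparison uses np.isclose rather than an exact inequality, which is a cautious choice for floating-point values and not something the exercise requires.

```python
import numpy as np
import pandas as pd

# The three example scores from the text
scores = pd.Series([100, 50, 20])

# Dividing by 20 maps 0-100 onto 0-5
normalized = (scores / 20).round(2)

expected = pd.Series([5.00, 2.50, 1.00])

# np.isclose tolerates tiny floating-point differences that != would flag
n_mismatch = int((~np.isclose(normalized, expected)).sum())
print(normalized.tolist(), n_mismatch)
```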

In [16]:
# Transpose the describe() output and look at the max of each column

# fandango.describe().transpose()['max']   
In [17]:
#Create a Series called Q8 that takes the values in the RottenTomatoes column and normalizes them to 0-5 ranges. 
#To double check your calculation, your Series should match exactly the RT_norm column.

Q8 = round(fandango['RottenTomatoes']/20,2)
print(Q8)
0      3.70
1      4.25
2      4.00
3      0.90
4      0.70
       ... 
141    4.35
142    4.85
143    4.85
144    5.00
145    4.35
Name: RottenTomatoes, Length: 146, dtype: float64
In [18]:
# Double-check the code above - this should output 0.
# Store the count under a new name so the Q8 Series is not overwritten.

Q8_mismatches = (fandango['RT_norm'] != Q8).sum()
print(Q8_mismatches)
0

Q9:

  • First create a copy of the fandango DataFrame and call this Q9A.
  • Create a new column in the Q9A DataFrame called f_rt_norm that is the Fandango_Stars column minus the RT_norm_round column.
  • Select only the films where the RT_norm_round was higher (the f_rt_norm is negative) and save this as Q9B.
In [19]:
#Create a copy of the fandango DataFrame and call this Q9A
Q9A = fandango.copy()                           
Q9A
Out[19]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.30 3.30 3.55 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.00 3.35 3.75 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.50 3.20 4.05 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 1.10 2.35 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.40 1.45 1.70 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 3.35 3.95 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 3.40 3.20 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0

146 rows × 22 columns

In [20]:
#Create a new column in the Q9A DataFrame called f_rt_norm 
#that is the Fandango_Stars column minus the RT_norm_round column

Q9A['f_rt_norm'] = Q9A['Fandango_Stars'] - Q9A['RT_norm_round']
Q9A
Out[20]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference f_rt_norm
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.30 3.30 3.55 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5 1.5
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.00 3.35 3.75 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5 0.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.50 3.20 4.05 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5 1.0
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 1.10 2.35 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5 4.0
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.40 1.45 1.70 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5 3.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 3.35 3.95 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0 -0.5
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0 -1.5
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0 -1.5
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0 -1.5
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 3.40 3.20 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0 -1.0

146 rows × 23 columns

In [21]:
# Select only the films where the `RT_norm_round` was higher 
# (the `f_rt_norm` is negative) and save this as `Q9B`

Q9B = Q9A[Q9A['f_rt_norm'] < 0] 
Q9B
Out[21]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference f_rt_norm
7 Top Five (2014) 86 64 81 6.8 6.5 4.0 3.5 4.30 3.20 4.05 3.40 3.25 4.5 3.0 4.0 3.5 3.5 124 16876 3223 0.5 -0.5
8 Shaun the Sheep Movie (2015) 99 82 81 8.8 7.4 4.5 4.0 4.95 4.10 4.05 4.40 3.70 5.0 4.0 4.0 4.5 3.5 62 12227 896 0.5 -0.5
12 Leviathan (2014) 99 79 92 7.2 7.7 4.0 3.5 4.95 3.95 4.60 3.60 3.85 5.0 4.0 4.5 3.5 4.0 145 22521 64 0.5 -1.0
28 Wild Tales (2014) 96 92 77 8.8 8.2 4.5 4.1 4.80 4.60 3.85 4.40 4.10 5.0 4.5 4.0 4.5 4.0 107 50285 235 0.4 -0.5
30 Red Army (2015) 96 86 82 7.4 7.7 4.5 4.1 4.80 4.30 4.10 3.70 3.85 5.0 4.5 4.0 3.5 4.0 11 2275 54 0.4 -0.5
40 I'll See You In My Dreams (2015) 94 70 75 6.9 6.9 4.0 3.6 4.70 3.50 3.75 3.45 3.45 4.5 3.5 4.0 3.5 3.5 14 1151 281 0.4 -0.5
41 Timbuktu (2015) 99 78 91 6.9 7.2 4.0 3.6 4.95 3.90 4.55 3.45 3.60 5.0 4.0 4.5 3.5 3.5 37 6246 74 0.4 -1.0
42 About Elly (2015) 97 86 87 9.6 8.2 4.0 3.6 4.85 4.30 4.35 4.80 4.10 5.0 4.5 4.5 5.0 4.0 23 20659 43 0.4 -1.0
43 The Diary of a Teenage Girl (2015) 95 81 87 6.3 7.0 4.0 3.6 4.75 4.05 4.35 3.15 3.50 5.0 4.0 4.5 3.0 3.5 18 1107 38 0.4 -1.0
65 Birdman (2014) 92 78 88 8.0 7.9 4.0 3.7 4.60 3.90 4.40 4.00 3.95 4.5 4.0 4.5 4.0 4.0 1171 303505 4194 0.3 -0.5
66 The Gift (2015) 93 79 77 8.3 7.6 4.0 3.7 4.65 3.95 3.85 4.15 3.80 4.5 4.0 4.0 4.0 4.0 121 10891 2680 0.3 -0.5
69 Mr. Turner (2014) 98 56 94 6.6 6.9 3.5 3.2 4.90 2.80 4.70 3.30 3.45 5.0 3.0 4.5 3.5 3.5 98 13296 290 0.3 -1.5
70 Seymour: An Introduction (2015) 100 87 83 6.0 7.7 4.5 4.2 5.00 4.35 4.15 3.00 3.85 5.0 4.5 4.0 3.0 4.0 4 243 41 0.3 -0.5
88 Mad Max: Fury Road (2015) 97 88 89 8.7 8.3 4.5 4.3 4.85 4.40 4.45 4.35 4.15 5.0 4.5 4.5 4.5 4.0 2375 292023 10509 0.2 -0.5
90 The SpongeBob Movie: Sponge Out of Water (2015) 78 55 62 6.5 6.1 3.5 3.3 3.90 2.75 3.10 3.25 3.05 4.0 3.0 3.0 3.5 3.0 196 26046 4493 0.2 -0.5
91 Paddington (2015) 98 81 77 8.2 7.2 4.5 4.3 4.90 4.05 3.85 4.10 3.60 5.0 4.0 4.0 4.0 3.5 149 38593 4045 0.2 -0.5
93 What We Do in the Shadows (2015) 96 86 75 8.3 7.6 4.5 4.3 4.80 4.30 3.75 4.15 3.80 5.0 4.5 4.0 4.0 4.0 69 39561 259 0.2 -0.5
94 The Overnight (2015) 82 65 65 8.6 6.9 3.5 3.3 4.10 3.25 3.25 4.30 3.45 4.0 3.5 3.5 4.5 3.5 13 867 110 0.2 -0.5
95 The Salt of the Earth (2015) 96 90 83 7.8 8.4 4.5 4.3 4.80 4.50 4.15 3.90 4.20 5.0 4.5 4.0 4.0 4.0 13 6605 83 0.2 -0.5
96 Song of the Sea (2014) 99 92 86 8.2 8.2 4.5 4.3 4.95 4.60 4.30 4.10 4.10 5.0 4.5 4.5 4.0 4.0 62 14067 66 0.2 -0.5
112 It Follows (2015) 96 65 83 7.5 6.9 3.0 2.9 4.80 3.25 4.15 3.75 3.45 5.0 3.5 4.0 4.0 3.5 551 64656 2097 0.1 -2.0
113 Inherent Vice (2014) 73 52 81 7.4 6.7 3.0 2.9 3.65 2.60 4.05 3.70 3.35 3.5 2.5 4.0 3.5 3.5 286 44711 1078 0.1 -0.5
114 A Most Violent Year (2014) 90 69 79 7.0 7.1 3.5 3.4 4.50 3.45 3.95 3.50 3.55 4.5 3.5 4.0 3.5 3.5 133 32166 675 0.1 -1.0
115 While We're Young (2015) 83 52 76 6.7 6.4 3.0 2.9 4.15 2.60 3.80 3.35 3.20 4.0 2.5 4.0 3.5 3.0 65 17647 449 0.1 -1.0
116 Clouds of Sils Maria (2015) 89 67 78 7.1 6.8 3.5 3.4 4.45 3.35 3.90 3.55 3.40 4.5 3.5 4.0 3.5 3.5 36 11392 162 0.1 -1.0
119 Phoenix (2015) 99 81 91 8.0 7.2 3.5 3.4 4.95 4.05 4.55 4.00 3.60 5.0 4.0 4.5 4.0 3.5 21 3687 70 0.1 -1.5
120 The Wolfpack (2015) 84 73 75 7.0 7.1 3.5 3.4 4.20 3.65 3.75 3.50 3.55 4.0 3.5 4.0 3.5 3.5 8 1488 66 0.1 -0.5
122 Tangerine (2015) 95 86 86 7.3 7.4 4.0 3.9 4.75 4.30 4.30 3.65 3.70 5.0 4.5 4.5 3.5 3.5 14 696 36 0.1 -1.0
129 Amy (2015) 97 91 85 8.8 8.0 4.5 4.4 4.85 4.55 4.25 4.40 4.00 5.0 4.5 4.5 4.5 4.0 60 5630 729 0.1 -0.5
140 Inside Out (2015) 98 90 94 8.9 8.6 4.5 4.5 4.90 4.50 4.70 4.45 4.30 5.0 4.5 4.5 4.5 4.5 807 96252 15749 0.0 -0.5
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 3.35 3.95 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0 -0.5
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0 -1.5
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0 -1.5
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0 -1.5
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 3.40 3.20 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0 -1.0

Q10 & Q11:

  • Repeat the step above to create two new columns in Q9A, f_mc_norm and f_imdb_norm (in that order), by subtracting Metacritic_norm_round and IMDB_norm_round, respectively, from Fandango_Stars.
  • Select the rows where f_mc_norm is negative and save this as Q10.
  • Select the rows where f_imdb_norm is negative and save this as Q11.
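Since the three difference columns follow the same pattern, they can be built in one loop. The sketch below is one possible way to do it, using rows 141-145 copied from the Q9A output; the names tail, diff_cols, diffs, and higher_on_imdb are illustrative.

```python
import pandas as pd

# Rows 141-145, copied from the output above and trimmed to the needed columns
tail = pd.DataFrame({
    "FILM": [
        "Mr. Holmes (2015)",
        "'71 (2015)",
        "Two Days, One Night (2014)",
        "Gett: The Trial of Viviane Amsalem (2015)",
        "Kumiko, The Treasure Hunter (2015)",
    ],
    "Fandango_Stars": [4.0, 3.5, 3.5, 3.5, 3.5],
    "RT_norm_round": [4.5, 5.0, 5.0, 5.0, 4.5],
    "Metacritic_norm_round": [3.5, 4.0, 4.5, 4.5, 3.5],
    "IMDB_norm_round": [3.5, 3.5, 3.5, 4.0, 3.5],
})

# New column name -> the site column to subtract from Fandango_Stars
diff_cols = {
    "f_rt_norm": "RT_norm_round",
    "f_mc_norm": "Metacritic_norm_round",
    "f_imdb_norm": "IMDB_norm_round",
}

diffs = tail.copy()  # leave the source frame untouched
for new_col, site_col in diff_cols.items():
    diffs[new_col] = diffs["Fandango_Stars"] - diffs[site_col]

# Films that IMDB rated higher than Fandango did
higher_on_imdb = diffs[diffs["f_imdb_norm"] < 0]
print(higher_on_imdb["FILM"].tolist())
```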
In [22]:
#creating two new columns in Q9A by subtracting: 
#The Metacritic_norm_round and IMDB_norm_round from Fandango_Stars and 
#Call the columns f_mc_norm and f_imdb_norm respectively and in that order

Q9A['f_mc_norm'] =  Q9A['Fandango_Stars'] - Q9A['Metacritic_norm_round']  
Q9A['f_imdb_norm'] =  Q9A['Fandango_Stars'] - Q9A['IMDB_norm_round']
Q9A
Out[22]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference f_rt_norm f_mc_norm f_imdb_norm
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.30 3.30 3.55 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5 1.5 1.5 1.0
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.00 3.35 3.75 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5 0.5 1.5 1.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.50 3.20 4.05 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5 1.0 2.0 1.0
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 1.10 2.35 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5 4.0 4.0 2.5
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.40 1.45 1.70 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5 3.0 2.0 1.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 3.35 3.95 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0 -0.5 0.5 0.5
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0 -1.5 -0.5 0.0
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0 -1.5 -1.0 0.0
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0 -1.5 -1.0 -0.5
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 3.40 3.20 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0 -1.0 0.0 0.0

146 rows × 25 columns

In [23]:
#Select the rows where f_mc_norm is negative 
#And save this as Q10

Q10 = Q9A[Q9A['f_mc_norm'] < 0]
Q10
Out[23]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference f_rt_norm f_mc_norm f_imdb_norm
12 Leviathan (2014) 99 79 92 7.2 7.7 4.0 3.5 4.95 3.95 4.60 3.60 3.85 5.0 4.0 4.5 3.5 4.0 145 22521 64 0.5 -1.0 -0.5 0.0
41 Timbuktu (2015) 99 78 91 6.9 7.2 4.0 3.6 4.95 3.90 4.55 3.45 3.60 5.0 4.0 4.5 3.5 3.5 37 6246 74 0.4 -1.0 -0.5 0.5
42 About Elly (2015) 97 86 87 9.6 8.2 4.0 3.6 4.85 4.30 4.35 4.80 4.10 5.0 4.5 4.5 5.0 4.0 23 20659 43 0.4 -1.0 -0.5 0.0
43 The Diary of a Teenage Girl (2015) 95 81 87 6.3 7.0 4.0 3.6 4.75 4.05 4.35 3.15 3.50 5.0 4.0 4.5 3.0 3.5 18 1107 38 0.4 -1.0 -0.5 0.5
65 Birdman (2014) 92 78 88 8.0 7.9 4.0 3.7 4.60 3.90 4.40 4.00 3.95 4.5 4.0 4.5 4.0 4.0 1171 303505 4194 0.3 -0.5 -0.5 0.0
69 Mr. Turner (2014) 98 56 94 6.6 6.9 3.5 3.2 4.90 2.80 4.70 3.30 3.45 5.0 3.0 4.5 3.5 3.5 98 13296 290 0.3 -1.5 -1.0 0.0
112 It Follows (2015) 96 65 83 7.5 6.9 3.0 2.9 4.80 3.25 4.15 3.75 3.45 5.0 3.5 4.0 4.0 3.5 551 64656 2097 0.1 -2.0 -1.0 -0.5
113 Inherent Vice (2014) 73 52 81 7.4 6.7 3.0 2.9 3.65 2.60 4.05 3.70 3.35 3.5 2.5 4.0 3.5 3.5 286 44711 1078 0.1 -0.5 -1.0 -0.5
114 A Most Violent Year (2014) 90 69 79 7.0 7.1 3.5 3.4 4.50 3.45 3.95 3.50 3.55 4.5 3.5 4.0 3.5 3.5 133 32166 675 0.1 -1.0 -0.5 0.0
115 While We're Young (2015) 83 52 76 6.7 6.4 3.0 2.9 4.15 2.60 3.80 3.35 3.20 4.0 2.5 4.0 3.5 3.0 65 17647 449 0.1 -1.0 -1.0 0.0
116 Clouds of Sils Maria (2015) 89 67 78 7.1 6.8 3.5 3.4 4.45 3.35 3.90 3.55 3.40 4.5 3.5 4.0 3.5 3.5 36 11392 162 0.1 -1.0 -0.5 0.0
119 Phoenix (2015) 99 81 91 8.0 7.2 3.5 3.4 4.95 4.05 4.55 4.00 3.60 5.0 4.0 4.5 4.0 3.5 21 3687 70 0.1 -1.5 -1.0 0.0
120 The Wolfpack (2015) 84 73 75 7.0 7.1 3.5 3.4 4.20 3.65 3.75 3.50 3.55 4.0 3.5 4.0 3.5 3.5 8 1488 66 0.1 -0.5 -0.5 0.0
122 Tangerine (2015) 95 86 86 7.3 7.4 4.0 3.9 4.75 4.30 4.30 3.65 3.70 5.0 4.5 4.5 3.5 3.5 14 696 36 0.1 -1.0 -0.5 0.5
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0 -1.5 -0.5 0.0
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 4.45 4.40 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0 -1.5 -1.0 0.0
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0 -1.5 -1.0 -0.5
In [24]:
#Select the rows where f_imdb_norm is negative 
#And save this as Q11
Q11 = Q9A[Q9A['f_imdb_norm'] < 0]
Q11
Out[24]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference f_rt_norm f_mc_norm f_imdb_norm
112 It Follows (2015) 96 65 83 7.5 6.9 3.0 2.9 4.80 3.25 4.15 3.75 3.45 5.0 3.5 4.0 4.0 3.5 551 64656 2097 0.1 -2.0 -1.0 -0.5
113 Inherent Vice (2014) 73 52 81 7.4 6.7 3.0 2.9 3.65 2.60 4.05 3.70 3.35 3.5 2.5 4.0 3.5 3.5 286 44711 1078 0.1 -0.5 -1.0 -0.5
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0 -1.5 -1.0 -0.5

Q12: What are the means of the f_rt_norm, f_mc_norm, and f_imdb_norm columns, rounded to 3 decimal places? Put these three averages into a list called Q12.

In [25]:
#Compute the means of the f_rt_norm, f_mc_norm, and f_imdb_norm columns rounded to 3 decimal places. 
# Put these three averages into a list called Q12.
#First we change the pd df to np.array, get the means and then change np.array to list

Q12 = Q9A[['f_rt_norm', 'f_mc_norm','f_imdb_norm']].to_numpy() #Change to numpy
Q12 = np.mean(Q12, axis = 0).tolist()    #column-wise mean, then convert to list
Q12 = [round(elem,3) for elem in Q12 ]   #round to 3 decimal places
Q12
Out[25]:
[1.024, 1.116, 0.709]

Q13: What are the median values of the f_rt_norm, f_mc_norm, and f_imdb_norm columns? Put these three median values into a list called Q13.

In [26]:
#Compute the median values of the f_rt_norm, f_mc_norm, and f_imdb_norm columns 
#Put the three median values into a list called Q13
#First we change the pd df to np.array and get the column-wise medians
# We then finally change np.array into list

Q13 = Q9A[['f_rt_norm', 'f_mc_norm','f_imdb_norm']].to_numpy() #Change to numpy
Q13 = np.median(Q13, axis = 0).tolist()    #column-wise median, then convert to list
Q13
Out[26]:
[1.0, 1.0, 0.5]
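The means and medians can also be taken without the NumPy round trip, since DataFrame.mean and DataFrame.median work column by column. This sketch uses only the difference values for rows 141-145 copied from the Q9A output, so its numbers differ from Q12 and Q13, which cover all 146 films.

```python
import pandas as pd

# Difference columns for rows 141-145, copied from the Q9A output above
diffs = pd.DataFrame({
    "f_rt_norm": [-0.5, -1.5, -1.5, -1.5, -1.0],
    "f_mc_norm": [0.5, -0.5, -1.0, -1.0, 0.0],
    "f_imdb_norm": [0.5, 0.0, 0.0, -0.5, 0.0],
})

# Column-wise statistics, converted to plain Python lists
means = diffs.mean().round(3).tolist()
medians = diffs.median().tolist()
print(means, medians)
```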

Q14: The article mentions the movie Avengers: Age of Ultron (2015). Select this film in the fandango data and save this as Q14.

In [27]:
# Select the movie 'Avengers: Age of Ultron (2015)' in the fandango data 
# And save this as Q14

Q14 = fandango[fandango['FILM'] == 'Avengers: Age of Ultron (2015)']
Q14
Out[27]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.7 4.3 3.3 3.55 3.9 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5

Q15: Select only the FILM, Fandango_Stars, and Fandango_Ratingvalue columns from Q14 and save it as Q15.

In [28]:
# Select only the `FILM`, `Fandango_Stars`, and `Fandango_Ratingvalue` columns from `Q14` 
# And save it as `Q15`

Q15 = Q14[['FILM','Fandango_Stars','Fandango_Ratingvalue']]
Q15
Out[28]:
FILM Fandango_Stars Fandango_Ratingvalue
0 Avengers: Age of Ultron (2015) 5.0 4.5

As was mentioned in the article, you can see that this movie gained an entire half star between the actual rating score and the stars score.
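The gap between the displayed stars and the underlying rating can be computed for several films at once. The sketch below uses three rows copied from the outputs above; the column name star_gain is made up for this example. For these rows the result matches the Fandango_Difference column already present in the data.

```python
import pandas as pd

# Three rows copied from the outputs above (rows 0, 48 and 141)
ratings = pd.DataFrame({
    "FILM": [
        "Avengers: Age of Ultron (2015)",
        "Fantastic Four (2015)",
        "Mr. Holmes (2015)",
    ],
    "Fandango_Stars": [5.0, 3.0, 4.0],
    "Fandango_Ratingvalue": [4.5, 2.7, 4.0],
})

# Stars shown on the site minus the underlying rating value;
# round(1) removes floating-point noise such as 0.2999999
ratings["star_gain"] = (
    ratings["Fandango_Stars"] - ratings["Fandango_Ratingvalue"]
).round(1)
print(ratings[["FILM", "star_gain"]])
```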

Q16: Index the fandango DataFrame and select all movies with an IMDB score of 8 or higher. Save this DataFrame as Q16.

In [29]:
# "Index" here means boolean indexing; no set_index call is needed,
# and fandango still has its default integer index.

# Select all movies with an IMDB score of 8 or higher  
# Save the DataFrame as Q16 

fandango_IMDB_scores  = fandango[fandango['IMDB']>= 8] 
Q16 = fandango_IMDB_scores
Q16
Out[29]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
14 The Imitation Game (2014) 90 92 73 8.2 8.1 5.0 4.6 4.50 4.60 3.65 4.10 4.05 4.5 4.5 3.5 4.0 4.0 566 334164 8055 0.4
28 Wild Tales (2014) 96 92 77 8.8 8.2 4.5 4.1 4.80 4.60 3.85 4.40 4.10 5.0 4.5 4.0 4.5 4.0 107 50285 235 0.4
42 About Elly (2015) 97 86 87 9.6 8.2 4.0 3.6 4.85 4.30 4.35 4.80 4.10 5.0 4.5 4.5 5.0 4.0 23 20659 43 0.4
76 Straight Outta Compton (2015) 90 94 72 7.3 8.4 5.0 4.8 4.50 4.70 3.60 3.65 4.20 4.5 4.5 3.5 3.5 4.0 90 15982 8096 0.2
86 Me and Earl and The Dying Girl (2015) 81 89 74 8.4 8.2 4.5 4.3 4.05 4.45 3.70 4.20 4.10 4.0 4.5 3.5 4.0 4.0 41 5269 624 0.2
88 Mad Max: Fury Road (2015) 97 88 89 8.7 8.3 4.5 4.3 4.85 4.40 4.45 4.35 4.15 5.0 4.5 4.5 4.5 4.0 2375 292023 10509 0.2
95 The Salt of the Earth (2015) 96 90 83 7.8 8.4 4.5 4.3 4.80 4.50 4.15 3.90 4.20 5.0 4.5 4.0 4.0 4.0 13 6605 83 0.2
96 Song of the Sea (2014) 99 92 86 8.2 8.2 4.5 4.3 4.95 4.60 4.30 4.10 4.10 5.0 4.5 4.5 4.0 4.0 62 14067 66 0.2
129 Amy (2015) 97 91 85 8.8 8.0 4.5 4.4 4.85 4.55 4.25 4.40 4.00 5.0 4.5 4.5 4.5 4.0 60 5630 729 0.1
140 Inside Out (2015) 98 90 94 8.9 8.6 4.5 4.5 4.90 4.50 4.70 4.45 4.30 5.0 4.5 4.5 4.5 4.5 807 96252 15749 0.0

Q17: Index the fandango DataFrame and select all movies with a RottenTomatoes score of 100. Save this DataFrame as Q17.

In [30]:
# Boolean indexing on the RottenTomatoes column; no set_index call is needed

# Select all movies with a RottenTomatoes score of 100 
# Save this DataFrame as Q17

fandango_RT_scores_100 = fandango[fandango['RottenTomatoes']==100]
Q17 = fandango_RT_scores_100
Q17
Out[30]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
70 Seymour: An Introduction (2015) 100 87 83 6.0 7.7 4.5 4.2 5.0 4.35 4.15 3.00 3.85 5.0 4.5 4.0 3.0 4.0 4 243 41 0.3
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.0 4.05 4.50 3.65 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0

Q18: Index the fandango DataFrame and select all movies with a Metacritic score less than 20. Save this DataFrame as Q18.

In [31]:
# Boolean indexing on the Metacritic column; no set_index call is needed

# Select all movies with a Metacritic score less than 20
# Save the DataFrame as Q18

fandango_Metacritic_scores = fandango[fandango['Metacritic'] < 20]
Q18 = fandango_Metacritic_scores
Q18
Out[31]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
133 Paul Blart: Mall Cop 2 (2015) 5 36 13 2.4 4.3 3.5 3.5 0.25 1.8 0.65 1.2 2.15 0.5 2.0 0.5 1.0 2.0 211 15004 3054 0.0

Q19: Index the fandango DataFrame and select all movies with a RottenTomatoes_User greater than 85, Metacritic_User score greater than 8.5, and an IMDB score greater than 8.5. Save this DataFrame as Q19.

In [32]:
# Boolean indexing on three columns at once; no set_index call is needed.
# The selection itself is in the next cell.
In [33]:
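# Select all movies with a RottenTomatoes_User score greater than 85,
# a Metacritic_User score greater than 8.5, and an IMDB score greater than 8.5.
# Each condition is wrapped in parentheses and combined with & (element-wise AND).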

Q19 = fandango[(fandango['RottenTomatoes_User'] > 85) & (fandango['IMDB'] > 8.5) & (fandango['Metacritic_User'] > 8.5)]
Q19  
Out[33]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
140 Inside Out (2015) 98 90 94 8.9 8.6 4.5 4.5 4.9 4.5 4.7 4.45 4.3 5.0 4.5 4.5 4.5 4.5 807 96252 15749 0.0
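The multi-condition filter can be written either as a combined boolean mask or as a query string. The sketch below uses a three-row stand-in frame (`demo`, a hypothetical name) whose values are copied from the Q16 output above; it is an illustration, not a replacement for the graded answer.

```python
import pandas as pd

# Small stand-in frame; values copied from rows shown in the Q16 output
demo = pd.DataFrame({
    'FILM': ['Inside Out (2015)', 'Mad Max: Fury Road (2015)', 'Amy (2015)'],
    'RottenTomatoes_User': [90, 88, 91],
    'Metacritic_User': [8.9, 8.7, 8.8],
    'IMDB': [8.6, 8.3, 8.0],
})

# Each comparison needs its own parentheses because & binds tighter than >
mask = (
    (demo['RottenTomatoes_User'] > 85)
    & (demo['Metacritic_User'] > 8.5)
    & (demo['IMDB'] > 8.5)
)
by_mask = demo[mask]

# The same filter expressed as a query string
by_query = demo.query(
    'RottenTomatoes_User > 85 and Metacritic_User > 8.5 and IMDB > 8.5'
)

print(by_mask['FILM'].tolist())
```

Both forms select the same rows; the mask version is closer to what the exercise asks for, while `query` can be easier to read when there are many conditions.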

Q20: Sort the fandango DataFrame by the number of Fandango votes (in descending order) and save this new DataFrame as fandango_sorted_value. Select only the films with more than 30,000 votes and save as Q20 in descending order by number of votes.

In [34]:
# Sort by Fandango_votes (descending) first, then filter the sorted frame.
# Filtering keeps the row order, so Q20 stays sorted by votes.

fandango_sorted_value = fandango.sort_values(by='Fandango_votes', ascending=False)
Q20 = fandango_sorted_value[fandango_sorted_value['Fandango_votes'] > 30000] 
print(Q20)
                            FILM  RottenTomatoes  RottenTomatoes_User  \
97   Fifty Shades of Grey (2015)              25                   42   
130        Jurassic World (2015)              71                   81   
72        American Sniper (2015)              72                   85   
73              Furious 7 (2015)              81                   84   

     Metacritic  Metacritic_User  IMDB  Fandango_Stars  Fandango_Ratingvalue  \
97           46              3.2   4.2             4.0                   3.9   
130          59              7.0   7.3             4.5                   4.5   
72           72              6.6   7.4             5.0                   4.8   
73           67              6.8   7.4             5.0                   4.8   

     RT_norm  RT_user_norm  Metacritic_norm  Metacritic_user_nom  IMDB_norm  \
97      1.25          2.10             2.30                  1.6       2.10   
130     3.55          4.05             2.95                  3.5       3.65   
72      3.60          4.25             3.60                  3.3       3.70   
73      4.05          4.20             3.35                  3.4       3.70   

     RT_norm_round  RT_user_norm_round  Metacritic_norm_round  \
97             1.5                 2.0                    2.5   
130            3.5                 4.0                    3.0   
72             3.5                 4.5                    3.5   
73             4.0                 4.0                    3.5   

     Metacritic_user_norm_round  IMDB_norm_round  Metacritic_user_vote_count  \
97                          1.5              2.0                         778   
130                         3.5              3.5                        1281   
72                          3.5              3.5                         850   
73                          3.5              3.5                         764   

     IMDB_user_vote_count  Fandango_votes  Fandango_Difference  
97                 179506           34846                  0.1  
130                241807           34390                  0.0  
72                 251856           34085                  0.2  
73                 207211           33538                  0.2  
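Sorting first and filtering second works because a boolean filter preserves row order. A minimal sketch of that property, using a hypothetical six-row frame (`demo`) with vote counts copied from outputs shown earlier in this notebook:

```python
import pandas as pd

# Stand-in frame; vote counts copied from rows shown in this notebook
demo = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'Furious 7 (2015)',
             'Fifty Shades of Grey (2015)', 'Inside Out (2015)',
             'Jurassic World (2015)', 'American Sniper (2015)'],
    'Fandango_votes': [14846, 33538, 34846, 15749, 34390, 34085],
})

sorted_demo = demo.sort_values(by='Fandango_votes', ascending=False)
top = sorted_demo[sorted_demo['Fandango_votes'] > 30000]

# Filtering does not reorder rows, so the result is still descending by votes
print(top['FILM'].tolist())
print(top['Fandango_votes'].is_monotonic_decreasing)
```

The `is_monotonic_decreasing` check is a quick way to confirm the ordering without inspecting the table by eye.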

Q21: Using the above Q20 DataFrame, select only the films that have a Fandango_Stars rating of 5 and save this DataFrame as Q21.

In [35]:
#Use the above Q20 DataFrame 
#Select only the films that have a Fandango_Stars rating of 5 
#And save this DataFrame as Q21

Q21 = Q20[Q20['Fandango_Stars'] == 5]
Q21
Out[35]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
72 American Sniper (2015) 72 85 72 6.6 7.4 5.0 4.8 3.60 4.25 3.60 3.3 3.7 3.5 4.5 3.5 3.5 3.5 850 251856 34085 0.2
73 Furious 7 (2015) 81 84 67 6.8 7.4 5.0 4.8 4.05 4.20 3.35 3.4 3.7 4.0 4.0 3.5 3.5 3.5 764 207211 33538 0.2

Q22: The article from 538 goes into detail about the rounding problem with the Fandango star scores. Let's see if we can do better.

  • First create a copy of the fandango DataFrame and call it Q22A.
  • Create a new column called Fandango_Stars_Revised that rounds the Fandango_Ratingvalue column to the nearest half a star. For example, 3.2 would round to 3.0, 3.3 would round to 3.5, 3.7 would round to 3.5, 3.8 would round to 4.0, etc.
  • Hint: Check out this stackoverflow question for help
  • Select only the FILM, Fandango_Ratingvalue, and Fandango_Stars_Revised columns and save this DataFrame as Q22B.
  • Code Check: Select all movies with a Fandango_Ratingvalue of 3.7 and make sure that you rounded this to 3.5. Select all movies with a Fandango_Ratingvalue of 3.8 and make sure that you rounded this to 4.0.
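One common way to round to the nearest half is to double the value, round to the nearest integer, and halve the result. The sketch below applies that to the four example values from the exercise; `round_to_half` is a hypothetical helper name, not something defined by the course or by pandas.

```python
import numpy as np

def round_to_half(x):
    """Round to the nearest 0.5 by doubling, rounding, and halving."""
    return np.round(np.asarray(x) * 2) / 2

# The four example values given in the exercise text
examples = [3.2, 3.3, 3.7, 3.8]
rounded = round_to_half(examples)

for value, result in zip(examples, rounded.tolist()):
    print(value, '->', result)
```

Note that `np.round` sends exact ties to the nearest even integer, so a value such as 3.25 would become 3.0 rather than 3.5. Ratings with a single decimal place never land exactly on a tie after doubling, so this does not affect the exercise.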
In [36]:
#1/ Create a copy of the fandango DataFrame and call it Q22A
# (no set_index here: FILM must stay a regular column so it can be selected for Q22B)

Q22A = fandango.copy()      # independent copy; changes to Q22A do not touch fandango

#2/ Create a new column called Fandango_Stars_Revised that rounds the
#   Fandango_Ratingvalue column to the nearest half star:
#   double the value, round to the nearest integer, then halve it.
#   For example, 3.2 -> 3.0, 3.3 -> 3.5, 3.7 -> 3.5, 3.8 -> 4.0.

Q22A['Fandango_Stars_Revised'] = (Q22A['Fandango_Ratingvalue'] * 2).round() / 2

print(Q22A)
                                          FILM  RottenTomatoes  \
0               Avengers: Age of Ultron (2015)              74   
1                            Cinderella (2015)              85   
2                               Ant-Man (2015)              80   
3                       Do You Believe? (2015)              18   
4                Hot Tub Time Machine 2 (2015)              14   
..                                         ...             ...   
141                          Mr. Holmes (2015)              87   
142                                 '71 (2015)              97   
143                 Two Days, One Night (2014)              97   
144  Gett: The Trial of Viviane Amsalem (2015)             100   
145         Kumiko, The Treasure Hunter (2015)              87   

     RottenTomatoes_User  Metacritic  Metacritic_User  IMDB  Fandango_Stars  \
0                     86          66              7.1   7.8             5.0   
1                     80          67              7.5   7.1             5.0   
2                     90          64              8.1   7.8             5.0   
3                     84          22              4.7   5.4             5.0   
4                     28          29              3.4   5.1             3.5   
..                   ...         ...              ...   ...             ...   
141                   78          67              7.9   7.4             4.0   
142                   82          83              7.5   7.2             3.5   
143                   78          89              8.8   7.4             3.5   
144                   81          90              7.3   7.8             3.5   
145                   63          68              6.4   6.7             3.5   

     Fandango_Ratingvalue  RT_norm  RT_user_norm  Metacritic_norm  \
0                     4.5     3.70          4.30             3.30   
1                     4.5     4.25          4.00             3.35   
2                     4.5     4.00          4.50             3.20   
3                     4.5     0.90          4.20             1.10   
4                     3.0     0.70          1.40             1.45   
..                    ...      ...           ...              ...   
141                   4.0     4.35          3.90             3.35   
142                   3.5     4.85          4.10             4.15   
143                   3.5     4.85          3.90             4.45   
144                   3.5     5.00          4.05             4.50   
145                   3.5     4.35          3.15             3.40   

     Metacritic_user_nom  IMDB_norm  RT_norm_round  RT_user_norm_round  \
0                   3.55       3.90            3.5                 4.5   
1                   3.75       3.55            4.5                 4.0   
2                   4.05       3.90            4.0                 4.5   
3                   2.35       2.70            1.0                 4.0   
4                   1.70       2.55            0.5                 1.5   
..                   ...        ...            ...                 ...   
141                 3.95       3.70            4.5                 4.0   
142                 3.75       3.60            5.0                 4.0   
143                 4.40       3.70            5.0                 4.0   
144                 3.65       3.90            5.0                 4.0   
145                 3.20       3.35            4.5                 3.0   

     Metacritic_norm_round  Metacritic_user_norm_round  IMDB_norm_round  \
0                      3.5                         3.5              4.0   
1                      3.5                         4.0              3.5   
2                      3.0                         4.0              4.0   
3                      1.0                         2.5              2.5   
4                      1.5                         1.5              2.5   
..                     ...                         ...              ...   
141                    3.5                         4.0              3.5   
142                    4.0                         4.0              3.5   
143                    4.5                         4.5              3.5   
144                    4.5                         3.5              4.0   
145                    3.5                         3.0              3.5   

     Metacritic_user_vote_count  IMDB_user_vote_count  Fandango_votes  \
0                          1330                271107           14846   
1                           249                 65709           12640   
2                           627                103660           12055   
3                            31                  3136            1793   
4                            88                 19560            1021   
..                          ...                   ...             ...   
141                          33                  7367            1348   
142                          60                 24116             192   
143                         123                 24345             118   
144                          19                  1955              59   
145                          19                  5289              41   

     Fandango_Difference  Fandango_Stars_Revised  
0                    0.5                     4.5  
1                    0.5                     4.5  
2                    0.5                     4.5  
3                    0.5                     4.5  
4                    0.5                     3.0  
..                   ...                     ...  
141                  0.0                     4.0  
142                  0.0                     3.5  
143                  0.0                     3.5  
144                  0.0                     3.5  
145                  0.0                     3.5  

[146 rows x 23 columns]
In [37]:
# Select only the FILM, Fandango_Ratingvalue, and Fandango_Stars_Revised columns 
# Save this DataFrame as Q22B

Q22B = Q22A[['FILM','Fandango_Ratingvalue','Fandango_Stars_Revised']]
print(Q22B)
                                          FILM  Fandango_Ratingvalue  \
0               Avengers: Age of Ultron (2015)                   4.5   
1                            Cinderella (2015)                   4.5   
2                               Ant-Man (2015)                   4.5   
3                       Do You Believe? (2015)                   4.5   
4                Hot Tub Time Machine 2 (2015)                   3.0   
..                                         ...                   ...   
141                          Mr. Holmes (2015)                   4.0   
142                                 '71 (2015)                   3.5   
143                 Two Days, One Night (2014)                   3.5   
144  Gett: The Trial of Viviane Amsalem (2015)                   3.5   
145         Kumiko, The Treasure Hunter (2015)                   3.5   

     Fandango_Stars_Revised  
0                       4.5  
1                       4.5  
2                       4.5  
3                       4.5  
4                       3.0  
..                      ...  
141                     4.0  
142                     3.5  
143                     3.5  
144                     3.5  
145                     3.5  

[146 rows x 3 columns]
In [38]:
# Code check: select all movies with a Fandango_Ratingvalue of 3.7
# and confirm that Fandango_Stars_Revised is 3.5 for each of them.
# Save as Q22B_2

Q22B_2 = Q22A.loc[Q22A['Fandango_Ratingvalue'] == 3.7,
                  ['FILM', 'Fandango_Ratingvalue', 'Fandango_Stars_Revised']]
assert (Q22B_2['Fandango_Stars_Revised'] == 3.5).all()   # verify the rounding; do not overwrite it
print(Q22B_2)
                          FILM  Fandango_Ratingvalue  Fandango_Stars_Revised
45         Tomorrowland (2015)                   3.7                     3.5
53          Hot Pursuit (2015)                   3.7                     3.5
56      Project Almanac (2015)                   3.7                     3.5
57  Ricki and the Flash (2015)                   3.7                     3.5
61       American Ultra (2015)                   3.7                     3.5
63             Child 44 (2015)                   3.7                     3.5
64          Dark Places (2015)                   3.7                     3.5
65              Birdman (2014)                   3.7                     3.5
66             The Gift (2015)                   3.7                     3.5
In [39]:
# Code check: select all movies with a Fandango_Ratingvalue of 3.8
# and confirm that Fandango_Stars_Revised is 4.0 for each of them.
# Save as Q22B_3

Q22B_3 = Q22A.loc[Q22A['Fandango_Ratingvalue'] == 3.8,
                  ['FILM', 'Fandango_Ratingvalue', 'Fandango_Stars_Revised']]
assert (Q22B_3['Fandango_Stars_Revised'] == 4.0).all()   # verify the rounding; do not overwrite it

print(Q22B_3)
                  FILM  Fandango_Ratingvalue  Fandango_Stars_Revised
77     Vacation (2015)                   3.8                     4.0
78      Chappie (2015)                   3.8                     4.0
80  Paper Towns (2015)                   3.8                     4.0
81     Big Eyes (2014)                   3.8                     4.0
83    Self/less (2015)                   3.8                     4.0

Q23: Calculate the sum for the Fandango_Difference column and call this Q23. Round to 2 decimals.

In [40]:
#Calculate the sum for the Fandango_Difference column and call this Q23. 
#Round to 2 decimals

Q23 = Q22A.Fandango_Difference.sum().round(2)
Q23
Out[40]:
35.6

Q24:

  • Create a new Series called ser by subtracting Fandango_Ratingvalue from Fandango_Stars_Revised from the Q22A DataFrame.
  • Calculate the sum of ser and call this Q24. Round to 2 decimals.
In [41]:
# Create a new Series called ser by subtracting Fandango_Ratingvalue
# from Fandango_Stars_Revised in the Q22A DataFrame.
# Subtracting two columns already returns a Series, so no pd.Series() wrapper is needed.

ser = Q22A['Fandango_Stars_Revised'] - Q22A['Fandango_Ratingvalue']
ser
Out[41]:
0      0.0
1      0.0
2      0.0
3      0.0
4      0.0
      ... 
141    0.0
142    0.0
143    0.0
144    0.0
145    0.0
Length: 146, dtype: float64
In [42]:
#Calculate the sum of ser and call this Q24. Round to 2 decimals

Q24 = ser.sum().round(2)
Q24
Out[42]:
-0.4

You should see that the sum of ser (-0.4) is far closer to zero than the sum of the Fandango_Difference column (35.6). This makes sense: our Fandango_Stars_Revised column rounds to the nearest half star, so the rounding errors largely cancel out, whereas the Fandango_Stars column was always rounded up, as discussed in the 538 article.
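The same contrast shows up on a handful of rows. This sketch uses a hypothetical five-row frame (`demo`) with star and rating values copied from outputs shown earlier in the notebook. It assumes Fandango_Difference equals Fandango_Stars minus Fandango_Ratingvalue, which agrees with the values displayed for these rows.

```python
import pandas as pd

# Stand-in frame; values copied from rows shown earlier in this notebook
demo = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'The Imitation Game (2014)',
             'Wild Tales (2014)', 'About Elly (2015)', 'Inside Out (2015)'],
    'Fandango_Stars': [5.0, 5.0, 4.5, 4.0, 4.5],
    'Fandango_Ratingvalue': [4.5, 4.6, 4.1, 3.6, 4.5],
})

# Assumption: the published difference is displayed stars minus rating value
demo['Fandango_Difference'] = demo['Fandango_Stars'] - demo['Fandango_Ratingvalue']

# Revised stars: nearest half star (double, round, halve)
demo['Fandango_Stars_Revised'] = (demo['Fandango_Ratingvalue'] * 2).round() / 2
ser_demo = demo['Fandango_Stars_Revised'] - demo['Fandango_Ratingvalue']

original_sum = round(demo['Fandango_Difference'].sum(), 2)
revised_sum = round(ser_demo.sum(), 2)
print(original_sum, revised_sum)
```

On these five rows the original differences are all zero or positive, so they accumulate, while the revised differences are small and can be negative.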

Q25: Set the index of fandango as the FILM column and sort the index in ascending order. Save this as Q25.

In [43]:
# Set the index of fandango to the FILM column and sort the index in ascending order
# Save this as Q25 (fandango itself is left unchanged, so this cell can be re-run safely)

Q25 = fandango.set_index('FILM').sort_index(ascending=True)
Q25 
Out[43]:
RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm Metacritic_user_nom IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
FILM
'71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 4.15 3.75 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0
5 Flights Up (2015) 52 47 55 6.8 6.1 4.0 3.6 2.60 2.35 2.75 3.40 3.05 2.5 2.5 3.0 3.5 3.0 6 2174 79 0.4
A Little Chaos (2015) 40 47 51 7.0 6.4 4.0 3.9 2.00 2.35 2.55 3.50 3.20 2.0 2.5 2.5 3.5 3.0 7 4778 83 0.1
A Most Violent Year (2014) 90 69 79 7.0 7.1 3.5 3.4 4.50 3.45 3.95 3.50 3.55 4.5 3.5 4.0 3.5 3.5 133 32166 675 0.1
About Elly (2015) 97 86 87 9.6 8.2 4.0 3.6 4.85 4.30 4.35 4.80 4.10 5.0 4.5 4.5 5.0 4.0 23 20659 43 0.4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
What We Do in the Shadows (2015) 96 86 75 8.3 7.6 4.5 4.3 4.80 4.30 3.75 4.15 3.80 5.0 4.5 4.0 4.0 4.0 69 39561 259 0.2
When Marnie Was There (2015) 89 89 71 6.4 7.8 4.5 4.1 4.45 4.45 3.55 3.20 3.90 4.5 4.5 3.5 3.0 4.0 29 4160 46 0.4
While We're Young (2015) 83 52 76 6.7 6.4 3.0 2.9 4.15 2.60 3.80 3.35 3.20 4.0 2.5 4.0 3.5 3.0 65 17647 449 0.1
Wild Tales (2014) 96 92 77 8.8 8.2 4.5 4.1 4.80 4.60 3.85 4.40 4.10 5.0 4.5 4.0 4.5 4.0 107 50285 235 0.4
Woman in Gold (2015) 52 81 51 7.2 7.4 4.5 4.4 2.60 4.05 2.55 3.60 3.70 2.5 4.0 2.5 3.5 3.5 72 17957 2435 0.1

146 rows × 21 columns
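Once FILM is the index, a film can be looked up by title with `.loc`. A minimal sketch on a hypothetical three-row frame (`demo`), with IMDB scores copied from the output above:

```python
import pandas as pd

# Stand-in frame; IMDB scores copied from rows shown in this notebook
demo = pd.DataFrame({
    'FILM': ['Wild Tales (2014)', "'71 (2015)", 'About Elly (2015)'],
    'IMDB': [8.2, 7.2, 8.2],
})

# set_index returns a new DataFrame; demo itself keeps FILM as a column
by_film = demo.set_index('FILM').sort_index()

print(by_film.index.tolist())
print(by_film.loc['About Elly (2015)', 'IMDB'])
```

Titles that start with punctuation or digits, such as '71 (2015), sort ahead of titles that start with letters, which is why that film appears first in the Q25 output.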

Good work!

There is a lot more analysis that could be done with this data. For further practice, we encourage you to think of other questions and see if you can answer them with this data. This would also be great practice for producing various Matplotlib and Seaborn plots to analyze the data further.